Exploratory analysis

Author

Abraham Román Alvarez

Data summary

The sample consists of 45,000 observations of 14 variables.

  • 5 categorical
  • 9 numeric

Categorical variables

skim_type skim_variable n_missing complete_rate character.min character.max character.empty character.n_unique character.whitespace
character person_gender 0 1 4 6 0 2 0
character person_education 0 1 6 11 0 5 0
character person_home_ownership 0 1 3 8 0 4 0
character loan_intent 0 1 7 17 0 6 0
character previous_loan_defaults_on_file 0 1 2 3 0 2 0

Numeric variables

variable N.Valid Pct.Valid Mean Std.Dev Min Q1 Median Q3 Max IQR
cb_person_cred_hist_length 45000 100 5.867489e+00 3.879702e+00 2.00 3.00 4.00 8.00 30.00 5.00
credit_score 45000 100 6.326088e+02 5.043586e+01 390.00 601.00 640.00 670.00 850.00 69.00
label 45000 100 2.222222e-01 4.157443e-01 0.00 0.00 0.00 0.00 1.00 0.00
loan_amnt 45000 100 9.583158e+03 6.314887e+03 500.00 5000.00 8000.00 12237.50 35000.00 7237.25
loan_int_rate 45000 100 1.100661e+01 2.978808e+00 5.42 8.59 11.01 12.99 20.00 4.40
loan_percent_income 45000 100 1.397249e-01 8.721230e-02 0.00 0.07 0.12 0.19 0.66 0.12
person_age 45000 100 2.776418e+01 6.045108e+00 20.00 24.00 26.00 30.00 144.00 6.00
person_emp_exp 45000 100 5.410333e+00 6.063532e+00 0.00 1.00 4.00 8.00 125.00 7.00
person_income 45000 100 8.031905e+04 8.042250e+04 8000.00 47202.00 67048.00 95789.50 7200766.00 48585.25
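The columns of this table can be recomputed from a raw column in a few lines. The following is a minimal Python sketch using only the standard library. The `summarize` helper and the toy `ages` list are hypothetical and are not part of the original analysis, and the quartile convention (`method="inclusive"`) is an assumption, so Q1, Q3, and IQR may differ slightly from the values produced by the tool behind the table above.

```python
from statistics import mean, quantiles, stdev


def summarize(values):
    """Return the summary columns used above for one numeric column."""
    # Quartile convention is an assumption; other conventions give
    # slightly different Q1/Q3 on the same data.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "N.Valid": len(values),
        "Mean": mean(values),
        "Std.Dev": stdev(values),  # sample standard deviation
        "Min": min(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Max": max(values),
        "IQR": q3 - q1,
    }


# Toy data for illustration only (not taken from the dataset).
ages = [22, 24, 26, 30, 144]
summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A single extreme value, such as the 144 in the toy list, moves Mean and Std.Dev noticeably while leaving Q1, Median, and Q3 unchanged, which is why the table reports both kinds of statistics.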

The dataset includes the variable “loan_status”, which serves as our target variable (label).

Variable frequencies

Frequency of “loan_status”

Insights:

The most frequent class of loan_status accounts for 77.8% of the total, which suggests a dataset that is imbalanced, but acceptably so.
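The 77.8% figure follows from the mean of label in the numeric summary (0.2222222), because the mean of a 0/1 variable is the share of the positive class. A small sketch of that arithmetic, with illustrative variable names that do not come from the original analysis:

```python
# Mean of the 0/1 `label` column and row count, both from the summary table.
positive_rate = 0.2222222
n_total = 45_000

# Share of the larger class and the implied size of the smaller class.
majority_share = max(positive_rate, 1 - positive_rate)
minority_count = round(n_total * min(positive_rate, 1 - positive_rate))

# How many majority-class rows there are per minority-class row.
imbalance_ratio = majority_share / (1 - majority_share)

print(f"majority class share: {majority_share:.1%}")
print(f"minority class count: {minority_count}")
print(f"majority-to-minority ratio: {imbalance_ratio:.1f}")
```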

Frequency of “person_gender”

Frequency of “person_education”

Frequency of “person_home_ownership”

Frequency of “loan_intent”

Frequency of “previous_loan_defaults_on_file”

Distribution of numeric variables

Distribution of person_age

Distribution of the client's work experience

Distribution of the client's income

Distribution of the client's loan amount

Distribution of credit score

Distribution of numeric variables with respect to the label variable

Distribution of person_age

Distribution of person_income

Distribution of person_emp_exp

Distribution of loan_amnt

Distribution of loan_int_rate

Distribution of loan_percent_income

Distribution of cb_person_cred_hist_length

Distribution of credit_score

Scatter plot of credit score as a function of client age

Statistical tests

Categorical variables

Variable: client education

var cat_val proporcion ic_inf ic_sup n eventos p_value
person_education Associate 0.2203193 0.2129392 0.2278349 12028 2650 0.7328516
person_education Bachelor 0.2252407 0.2181904 0.2324105 13399 3018 0.7328516
person_education Doctorate 0.2286634 0.1961828 0.2637479 621 142 0.7328516
person_education High School 0.2231039 0.2156725 0.2306701 11972 2671 0.7328516
person_education Master 0.2176218 0.2079895 0.2274904 6980 1519 0.7328516
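The report does not name the test behind the p_value column. One natural candidate is a Pearson chi-square test of independence between the category and label, and the sketch below assumes that test. It uses only the Python standard library. The helper names `chi2_sf` and `chi2_independence` are hypothetical, and the input lists are the eventos and n columns of the table above. Under this assumption the result should land close to the reported 0.7328516.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with `df` degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from a closed form (df = 2 or df = 1) and climb with the
    # recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_independence(events, totals):
    """Pearson chi-square test that the event rate is equal across groups."""
    pooled = sum(events) / sum(totals)
    stat = 0.0
    for k, n in zip(events, totals):
        # Each group contributes an "event" cell and a "no event" cell.
        for observed, expected in ((k, n * pooled), (n - k, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    return stat, df, chi2_sf(stat, df)


# `eventos` and `n` columns of the person_education table.
events = [2650, 3018, 142, 2671, 1519]
totals = [12028, 13399, 621, 11972, 6980]
stat, df, p_value = chi2_independence(events, totals)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The same function accepts any table of this shape, so it can also be applied to the other categorical tables in this section by passing their eventos and n columns.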

Variable: home ownership type

var cat_val proporcion ic_inf ic_sup n eventos p_value
person_home_ownership MORTGAGE 0.1159608 0.1113793 0.1206635 18489 2144 0
person_home_ownership OTHER 0.3333333 0.2488994 0.4264342 117 39 0
person_home_ownership OWN 0.0752287 0.0659677 0.0853421 2951 222 0
person_home_ownership RENT 0.3239773 0.3179874 0.3300109 23443 7595 0
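The report does not say how ic_inf and ic_sup were computed. The intervals are not symmetric around proporcion, which rules out a plain normal approximation. The sketch below assumes a 95% exact (Clopper-Pearson) binomial interval, found by bisection with only the Python standard library. The function names are hypothetical, and the example uses the OTHER row (39 events out of 117).

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""

    def solve(j, target):
        # P(X >= j) increases with p, so bisection finds where it hits `target`.
        lo, hi = 0.0, 1.0
        for _ in range(50):
            mid = (lo + hi) / 2.0
            if 1.0 - binom_cdf(j - 1, n, mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: P(X >= k) = alpha / 2. Upper bound: P(X <= k) = alpha / 2.
    lower = 0.0 if k == 0 else solve(k, alpha / 2.0)
    upper = 1.0 if k == n else solve(k + 1, 1.0 - alpha / 2.0)
    return lower, upper


# OTHER row of the person_home_ownership table: eventos = 39, n = 117.
lower, upper = clopper_pearson(39, 117)
print(f"proportion = {39 / 117:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Summing the binomial terms directly is fine for small groups such as OTHER. For the larger groups it still works but is slow, and a library routine would be the more practical choice.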

Variable: loan intent

var cat_val proporcion ic_inf ic_sup n eventos p_value
loan_intent DEBTCONSOLIDATION 0.3027292 0.2920884 0.3135312 7145 2163 0
loan_intent EDUCATION 0.1695619 0.1619258 0.1774088 9153 1552 0
loan_intent HOMEIMPROVEMENT 0.2630148 0.2505808 0.2757388 4783 1258 0
loan_intent MEDICAL 0.2781937 0.2687126 0.2878262 8548 2378 0
loan_intent PERSONAL 0.2014036 0.1924088 0.2106294 7552 1521 0
loan_intent VENTURE 0.1442640 0.1365458 0.1522484 7819 1128 0

Numeric variables

Variable: loan amount

cuantil var num_rows media_col media_label porcentaje p_value
1 loan_amnt 13131 3477.991 0.2068388 29.18000 0
2 loan_amnt 10100 6756.866 0.1691089 22.44444 0
3 loan_amnt 10519 10342.012 0.2044871 23.37556 0
4 loan_amnt 11250 18536.944 0.3044444 25.00000 0
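Tables of this shape can be produced by cutting the numeric variable at its quartiles and averaging label within each bin. The sketch below shows one way to do that with the Python standard library. The `quartile_summary` helper and the toy data are hypothetical, and the choices of quartile convention and of right-closed bins (a value equal to a cut point goes to the lower bin) are assumptions. Ties at a cut point are one reason the bins need not hold exactly 25% of the rows each.

```python
from bisect import bisect_left
from statistics import mean, quantiles


def quartile_summary(values, labels):
    """Bin `values` by quartile and summarize `labels` within each bin."""
    cuts = quantiles(values, n=4, method="inclusive")  # Q1, median, Q3
    bins = {q: [] for q in range(1, 5)}
    for value, label in zip(values, labels):
        # bisect_left sends a value equal to a cut point to the lower bin.
        bins[bisect_left(cuts, value) + 1].append((value, label))

    rows = []
    for q, members in bins.items():
        if not members:
            continue
        rows.append({
            "cuantil": q,
            "num_rows": len(members),
            "media_col": mean(v for v, _ in members),
            "media_label": mean(y for _, y in members),
            "porcentaje": 100 * len(members) / len(values),
        })
    return rows


# Toy data for illustration only (not taken from the dataset).
toy_values = [1, 2, 3, 4, 5, 6, 7, 8]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
rows = quartile_summary(toy_values, toy_labels)
for row in rows:
    print(row)
```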

Variable: interest rate

cuantil var num_rows media_col media_label porcentaje p_value
1 loan_int_rate 11362 7.15566 0.0927654 25.24889 0
2 loan_int_rate 13075 10.29691 0.1665774 29.05556 0
3 loan_int_rate 9316 11.98020 0.1954702 20.70222 0
4 loan_int_rate 11247 14.91554 0.4398506 24.99333 0

Variable: loan amount as a percentage of income

cuantil var num_rows media_col media_label porcentaje p_value
1 loan_percent_income 11557 0.0481994 0.1121398 25.68222 0
2 loan_percent_income 11683 0.0992374 0.1317299 25.96222 0
3 loan_percent_income 11354 0.1568249 0.1887441 25.23111 0
4 loan_percent_income 10406 0.2681722 0.4826062 23.12444 0

Variable: credit score

cuantil var num_rows media_col media_label porcentaje p_value
1 credit_score 11265 563.3879 0.2248557 25.03333 0.2935483
2 credit_score 11566 622.4738 0.2257479 25.70222 0.2935483
3 credit_score 11235 655.5263 0.2219849 24.96667 0.2935483
4 credit_score 10934 691.0974 0.2160234 24.29778 0.2935483

Variable: client age

cuantil var num_rows media_col media_label porcentaje p_value
1 person_age 15934 22.89162 0.2355968 35.40889 1.6e-06
2 person_age 8166 25.44808 0.2217732 18.14667 1.6e-06
3 person_age 10299 28.33032 0.2151665 22.88667 1.6e-06
4 person_age 10601 36.32205 0.2093199 23.55778 1.6e-06

Variable: years of work experience

cuantil var num_rows media_col media_label porcentaje p_value
1 person_emp_exp 13627 0.2980113 0.2383503 30.28222 1e-07
2 person_emp_exp 11548 2.9471770 0.2192587 25.66222 1e-07
3 person_emp_exp 9811 6.3041484 0.2199572 21.80222 1e-07
4 person_emp_exp 10014 14.3319353 0.2059117 22.25333 1e-07

Variable: years of credit history

cuantil var num_rows media_col media_label porcentaje p_value
1 cb_person_cred_hist_length 14849 2.559768 0.2319348 32.99778 0.0002366
2 cb_person_cred_hist_length 8653 4.000000 0.2255865 19.22889 0.0002366
3 cb_person_cred_hist_length 11737 6.460680 0.2182841 26.08222 0.0002366
4 cb_person_cred_hist_length 9761 11.841615 0.2091999 21.69111 0.0002366

Variable: client income

cuantil var num_rows media_col media_label porcentaje p_value
1 person_income 11250 35268.65 0.4037333 25.00000 0
2 person_income 11251 57158.40 0.2205137 25.00222 0
3 person_income 11249 80046.01 0.1721042 24.99778 0
4 person_income 11250 148805.19 0.0925333 25.00000 0